AUTONOMOUS FAIL-OVER TO HOT-SPARE PROCESSOR USING SMI 



BACKGROUND OF THE INVENTION 
Technical Field: 

[0001] The present invention relates generally to data processing systems and in particular to 
system response to processor failure in a multi-processor data processing system. Still more 
particularly, the present invention relates to a method and system for dynamically activating a 
spare processor when processor failure is detected in a multi-processor data processing system. 

Description of the Related Art: 

[0002] Conventional data processing systems are often configured with multiple processors 
interconnected to each other and other system components. These multiple processors may exist 
on a single chip or may be manufactured on separate chips. The processors operate in tandem to 
efficiently complete tasks associated with application code being executed. Those skilled in the 
art are familiar with the various configurations and operations of multiprocessor systems (MPs). 

[0003] Occasionally, while a system is running and the processors are executing application 
code, one (or more) of the processors may fail (i.e., begin to provide inaccurate processing 
results and/or begin operating outside of pre-established or "ideal" operating parameters, etc.). 
When such processor failure occurs, the system's technician has to replace the failing processor 
(or processor chip) with a new one in order to maintain the level of processing desired for the 
system. This changing of processors is normally a manual operation, which requires the 
technician to halt all executing processes (across the system including the operating system 
(OS)), shut down the MP, obtain a new/replacement processor, complete the switch out of the 
failing processor, reboot the system, and then restart the executing processes across the MP. 

[0004] During the system reboot, the replacement processor is recognized by the MP's BIOS 
(basic input/output system) and activated for operation within the system. Conventionally, the 
failed (or failing) processor is physically detached (or removed) from the system bus (or 
interconnect), and the replacement processor is connected (plugged in) to the interconnect in 
place of the removed processor. This replacement method is adequate when non-critical 
processes are being completed on the MP; however, the time required to replace the processor 
and the resulting processing downtime are unacceptable for critical processes that require 
continuous uptime of the MP. 

[0005] Also, with current replacement methods, a separate replacement processor is required 
to be plugged-in after the failure condition is detected. This requires a technician to swap out the 
failed processor with the replacement, and as described above, the OS and executing processes 
are halted until the swap of the processors is completed. 

[0006] The traditional method of responding to processor failure severely limits the ability of 
larger systems (e.g., multiprocessor server systems) with non-failing processors to continue 
executing despite the presence of the failing processor. In larger server systems that require 
continuous up-time, replacement of a failing processor has to be completed without shutting 
down the entire system. Typically, in a server system, when a processor begins to fail, the 
processor must be taken out of the processing pipeline and replaced by another processor to 
avoid the entire MP crashing. Depending on the built-in redundancy and complexity of the MP, 
such a temporary removal may have wide ranging effects, from slightly degrading the overall 
performance of the MP to temporarily removing the MP from service. 

[0007] Currently, manufacturers of server systems provide different types of server 
architectures, with common architectures being the S/390 architecture and the Intel Architecture- 
32 bit (IA-32). The S/390 architecture has a machine instruction for switching to a backup 
processor, while IA-32 does not have a similar machine instruction. Rather, IA-32 is designed 
with the functionality to generate an SMI (Systems Management Interrupt) after a CPU fault. 
Generation of SMIs for standard system management tasks is unique to the IA-32, and the 
process is described in detail in U.S. Patent No. 6,625,679. 

[0008] Realizing that shutting down the entire MP and then restarting all processes is an 
unacceptable method of handling single processor failures, manufacturers designed some 
conventional MPs with a failure response mechanism that involves a hot-spare processor and 
hardware changes to the system architecture to support processor failure conditions. 
Implementation of the failure response mechanism is based on the type of server architecture. 

[0009] United States Patent No. 6,115,829 provides a hot spare processor for solving 
processor problems relating to the S/390 architecture. The solution involves the utilization of a 
hardware instruction built into the processor that is usable only by millicode. Since the problems 
in the S/390 architecture are specific to that architecture, the above solution is not available to 
the IA-32 architecture, which has a different processor configuration and exhibits a different set 
of processor problems. Those skilled in the art are familiar with the functional and architectural 
differences in the two types of architectures and appreciate that different response methods 
unique to each architecture must be implemented for processor failure. 

[0010] As another example, U.S. Patent 4,819,232 provides a hardware instruction for 
software programs to utilize when completing fault recovery to a spare processor from the 
primary processor. Implementation of this process in an IA-32 system would require 
architecture (hardware) changes to current IA-32. Another patent, U.S. Patent No. 5,155,729, 
provides a redundant processor that engages in a ping-ponging process with the primary processor 
during a hot swap condition. This process is also specific to the S/390 architecture and is not 
provided within the IA-32 architecture. Finally, U.S. Patent 6,370,657 places the system into 
standby prior to hot-swapping the processors; however, the response mechanism does not provide 
a hot-spare processor, nor does it provide a means to keep the OS running during the switch 
between processors. 

[0011] With server systems, processor downtime is an undesirable condition, and thus a fast 
fault-response mechanism/scheme is required. It would be desirable for such a scheme to 
include the ability to detect when one of the multiple processors has failed. Additionally, when a 
processor failure has been detected, it would also be desirable for the fault-response mechanism 
to quickly respond to the failure by providing a replacement processor without the system having 
to suspend processing and with minimal disruption to the overall system. The present invention 
recognizes that it would be desirable to provide a processor failure response mechanism that 
provides a replacement processor in a seamless manner so that executing processes and the OS 
continue executing during dynamic replacement of the failed processor. 
SUMMARY OF THE INVENTION 

[0012] Disclosed is a method and system for dynamically replacing a failing processor in a 
server system configured with IA-32 architecture without requiring hardware changes to the IA-32 
architecture or administrative effort. The method enables a system to remain operational, 
i.e., continue executing operating system (OS) processes and application processes on the other 
processors, while the failing processor is automatically de-activated and a replacement processor 
is dynamically activated. Negligible system downtime occurs during the transfer of executing 
threads/processes to the replacement processor, unlike other methods which require temporary 
halting of the OS processing and, in some instances, halting all other processes across the 
system. 

[0013] Implementation of the invention involves the utilization of Systems Management 
Interrupt (SMI) functionality provided within basic input output system (BIOS) of the IA-32 
architecture. At least one processor of the multiprocessor system (MP) is initially provided as a 
reserve (or hot-spare) processor that remains in an idle, off, or low-power mode. While in that 
mode, the OS is prevented from initially utilizing the hot-spare processor. When a processor 
failure is detected, SMI code running on a good processor instructs the OS to hold off allocating 
processes to the failing processor. Contemporaneously, the SMI (and OS) activates and 
completes an initialization of the hot-spare processor to prepare it to begin receiving the held-off 
processes. Control is then returned to the OS, which updates the "active" processor list and 
allocates the threads that were running on the failing processor to the hot-spare processor. 

[0014] Thus, a processor is able to autonomically fail-over to a hot-spare processor without 
affecting the OS or executing applications. The invention substantially eliminates the problems 
related to high availability during single processor failure without crashing all of the processors. 
Further, the invention completes this processor replacement without requiring specialized OS or 
middleware features. Finally, the invention provides a method for dynamic, hot swap of spare 
processors without the need of processor architecture changes. That is, no special hardware 
instruction is built into the processor, as in other methods. 



RPS920030082US1 



[0015] The above as well as additional objects, features, and advantages of the present 
invention will become apparent in the following detailed written description. 
BRIEF DESCRIPTION OF THE DRAWINGS 



[0016] The novel features believed characteristic of the invention are set forth in the 
appended claims. The invention itself however, as well as a preferred mode of use, further 
objects and advantages thereof, will best be understood by reference to the following detailed 
description of an illustrative embodiment when read in conjunction with the accompanying 
drawings, wherein: 

[0017] Figure 1 is a block diagram of a multiprocessor computer system (MP) designed 
according to IA-32 architecture and a spare processor according to one illustrative embodiment 
of the present invention; 

[0018] Figure 2A is a block diagram depicting two major internal components of one of the 
IA-32 processors of Figure 1, according to one illustrative embodiment of the present 
invention; 

[0019] Figure 2B is a block diagram of a multiprocessor chip designed with dual IA-32 
processors and a spare IA-32 processor to enable dynamic switching out of a failed processor 
during failure response in accordance with one embodiment of the present invention; and 

[0020] Figure 3 is a flow chart illustrating the process of detecting and responding to 
processor failure with the spare processor configuration of Figures 1 and 2B according to one 
embodiment of the present invention. 
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENT(S) 



[0021] The present invention provides a method for dynamically replacing failing processors 
in an IA-32 configured multiprocessor computer system (MP) by providing the MP with a spare 
processor that is initially held off and is later brought into the group of executing processors 
whenever one of the main processors is detected as failing. The invention provides a 
substantially seamless transfer of executing processes from the failing processor to the 
replacement processor, without halting the execution of the operating system (OS) and/or 
applications on the overall MP. That is, a failing processor is able to autonomically fail-over to a 
hot-spare processor without affecting the OS or executing applications. The failure response 
method of the invention is completed without requiring special hardware instructions built into 
the processors. 

[0022] With reference now to the figures and in particular to Figure 1, there is illustrated a 
multiprocessor system (MP) designed with a spare processor and other components that enable 
the implementation of the various features of the invention. MP 100 is designed according to 
IA-32 architecture and comprises an IA-32 processor system coupled to system components 
by a bridge 112. The IA-32 processor system may include one or more processors 120, 122 
that perform various computing functions. The processors 120, 122 are coupled to a common bus 
114, and may be coupled to a local advanced programmable interrupt controller (APIC) bus 
(described below). The processors 120, 122 share access to the common bus 114, and may also 
share other resources such as memory, input/output (I/O) devices, and interrupt handlers, for 
example. The system components provide enhanced functionality, and can be used with the 
existing IA-32 processors 120, 122, provided that appropriate hardware and/or software is used 
to ensure compatibility between the IA-32 processor system and the system components. 

[0023] MP 100 also comprises a spare processor 122. The processors are interconnected to 
each other via processor interconnect 114. Each processor 120, 122 comprises service processor 
logic (not shown), which communicates with other service processors of other processors as well 
as bus/interconnect logic (e.g., bridge 112) to allow the processors 120, 122 to operate in a 
coherent manner with each other and the rest of the MP 100. 
[0024] As shown by Figure 2A, each processor 120, 122 includes a core unit 140 that 
controls processing on the processor 120, 122 and an APIC 141 that processes external and 
normal interrupts that the processor 120, 122 may receive at its interrupt pins 143 and 144. 
APIC 141 also includes a connection to the APIC bus 18, used to deliver interrupts to the 
application processors. The interrupt pins 143 and 144 may be programmed to receive specific 
types of interrupts. 

[0025] The system components include one or more I/O controllers 105, which are coupled 
to a memory controller (MC) 107. The I/O controllers 105 are coupled to I/O device buses 109. 
The buses 109 may include a peripheral component interconnect (PCI) bus, for example. The 
buses 109 may connect standard computer peripherals such as a keyboard. 

[0026] Coupled to MC 107 is a system memory 110, within which is stored software for 
controlling the MP and executing application processes. This software includes the OS 113, 
BIOS 116, which includes SMI, and program applications 117. SMI notifies the OS of 
additional processor resources within the MP (e.g., an increase or decrease in the number of 
processors) as well as the addition or removal of other system resources (e.g., memory and I/O). 

[0027] A system bus 110 connects the memory controller 107 and other components to IA-32 
processors 120, 122 via bridges 112 and processor buses 114. The IA-32 processors 120, 
122 and the bridges 112 are grouped at nodes 111. 

[0028] The bridge 112 takes the interrupts that are being delivered over the system bus 110 
and signals the interrupt to the appropriate IA-32 processor 120, 122. To ensure that a particular 
interrupt reaches the appropriate IA-32 processor 120, 122, all interrupt transactions on the 
system bus 110 contain target node 111 and IA-32 processor identification. 

[0029] Software may initialize the I/O controllers 105 to transfer an interrupt transaction in a 
manner that identifies the IA-32 processor 120, 122 to be interrupted. The bridge 112 that is 
connected to the same processor bus 114 as the destination IA-32 processor 120, 122 recognizes 
the interrupt transaction, and asserts an interrupt pin at the targeted IA-32 processor 120, 122. 
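The routing rule in paragraphs [0028] and [0029] (every interrupt transaction names a target node and processor, and only the bridge on the destination processor's bus claims it) can be modeled in a few lines. This is a minimal Python sketch, not bridge hardware: the class names, field names, and small integer identifiers are hypothetical, and a real bridge matches bus transactions in logic rather than in a loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterruptTransaction:
    """Interrupt message on system bus 110 (field names are hypothetical)."""
    target_node: int  # node 111 that holds the destination processor
    target_cpu: int   # identification of the destination IA-32 processor
    vector: int       # interrupt vector to deliver


class Bridge:
    """A bridge 112 attached to one processor bus 114."""

    def __init__(self, node_id, cpu_ids):
        self.node_id = node_id
        self.cpu_ids = set(cpu_ids)
        self.asserted = []  # (cpu, vector) pairs whose interrupt pin was asserted

    def snoop(self, txn):
        """Claim the transaction only if its target sits on this bridge's bus."""
        if txn.target_node == self.node_id and txn.target_cpu in self.cpu_ids:
            self.asserted.append((txn.target_cpu, txn.vector))  # assert the pin
            return True
        return False


def broadcast(bridges, txn):
    """Put txn on the system bus; return the node that claimed it, or None."""
    for bridge in bridges:
        if bridge.snoop(txn):
            return bridge.node_id
    return None
```

With bridges for node 0 (processors 0 and 1) and node 1 (processors 2 and 3), a transaction for node 1, processor 3 is claimed only by the node-1 bridge, while one naming node 0 and processor 3 goes unclaimed because processor 3 is not on node 0's bus.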

[0030] During operation, the OS and software code from the program application are 
executed by the processors to complete specific tasks or functions associated with the 
application. The processors are interconnected to the other components (memory, etc.) via 
system interconnect 110. Interconnect fabric 110 includes wires and control logic for routing 
communication between the components as well as controlling the response of the OS of MP 100 
to changes in the hardware configuration. In the present invention, BIOS comprises processor 
failure response logic 207, which generates the SMI, and configuration setting logic 209. 

[0031] Figure 2B provides a block diagram representation of a multiprocessor chip designed 
with main IA-32 processors 120 and a spare processor 122. Also provided with MP chip 200 is a 
register 209 for tracking active processors on MP chip 200, as described below. In one alternate 
implementation, processors 120, 122 are interconnected to each other via processor-to-processor 
buses 203. Processors 120, 122 are also connected to bridge (bus controller) 207 via individual 
control buses 205. Control buses 205 enable transfer of control data from the bus controller that 
indicates which processor is active within MP chip 200. The control bus then provides a connector 
211 that couples MP chip 200 to a larger processing system, such as MP 100 of Figure 1, via pins 
213. 
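One way to picture register 209 is as a bit mask with one bit per processor on MP chip 200. The sketch below assumes that layout (bit n set means processor n is active) and an 8-bit width; neither the layout nor the method names are specified in the description, so treat them as illustrative only.

```python
class ActiveProcessorRegister:
    """Hypothetical model of register 209: bit n set means processor n is active."""

    def __init__(self, width=8):
        self.width = width
        self.value = 0

    def _check(self, cpu):
        if not 0 <= cpu < self.width:
            raise ValueError(f"processor {cpu} outside register width {self.width}")

    def set_active(self, cpu):
        self._check(cpu)
        self.value |= 1 << cpu

    def set_inactive(self, cpu):
        self._check(cpu)
        self.value &= ~(1 << cpu)

    def is_active(self, cpu):
        self._check(cpu)
        return bool((self.value >> cpu) & 1)

    def swap(self, failing, spare):
        """Clear the failing processor's bit and set the spare's bit in one write."""
        self._check(failing)
        self._check(spare)
        self.value = (self.value & ~(1 << failing)) | (1 << spare)
```

Starting from processors 0 and 1 active (0b011), swapping out processor 1 for spare processor 2 leaves 0b101.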

[0032] When the MP is initially activated for processing, one IA-32 processor is held off as a 
spare processor. Holding off the processor involves having the basic input output system (BIOS) 
recognize the processor as a spare during BIOS boot of the MP (e.g., during the power-on self 
test (POST)) and having the BIOS respond by not enabling the spare processor to be initially allocated for 
OS and application processing. This particular set-up of the MP may require some 
administrative input and support. However, in one embodiment, the processor 
initialization/identification code includes a software tag to identify the processor as a spare 
processor to the OS and other components. The other processors are initiated and identified to 
the OS for regular processing. 
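The hold-off step in paragraph [0032] amounts to partitioning the processors found at POST into those reported to the OS and those kept back as spares. A minimal Python sketch, assuming each processor carries a boolean spare tag (the `spare_tag` field and the use of an APIC ID as the identifier are illustrative assumptions, not details from the description):

```python
from dataclasses import dataclass


@dataclass
class Processor:
    apic_id: int
    spare_tag: bool = False  # hypothetical software tag marking a hot spare


def post_enumerate(processors):
    """Split POST results into OS-visible processors and hidden spares."""
    os_visible = [p.apic_id for p in processors if not p.spare_tag]
    hidden_spares = [p.apic_id for p in processors if p.spare_tag]
    if not os_visible:
        # The OS needs at least one regular processor to boot on.
        raise RuntimeError("at least one non-spare processor is required")
    return os_visible, hidden_spares
```

For processors 0 and 1 plus a tagged processor 2, the OS is told about [0, 1] only, and [2] is held back.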
[0033] In one embodiment, during boot, BIOS utilizes the advanced configuration and power 
interface (ACPI) specification and/or the S3 state to put the processor in a low-power, standby, 
or "off" state. The S3 state is a computer sleep state defined by the ACPI specification in which 
the system is largely unpowered and not running, but memory remains intact, powered, and in a 
continuous refresh cycle. This initial removal of the spare processor from the group of executing 
processors occurs before the BIOS hands over control to the OS. In another embodiment, the 
"hot spare processor" feature is enabled on-the-fly by the OS with a driver-OS-SMI handshake. 
Tracking which processors are active may be completed by a software table of active 
processors, which is available to be read by the OS when determining workload allocation to 
each of the active processors. 
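The point of the active-processor table in paragraph [0033] is that the OS only places work on processors the table marks as active, so a held-off spare never receives threads until its entry changes. The sketch below assumes a dictionary layout for the table and uses round-robin placement purely as a stand-in policy; a real OS scheduler is far more involved.

```python
from itertools import cycle


def allocate(threads, active_table):
    """Assign threads round-robin to processors the table marks active.

    active_table maps processor id -> bool (a hypothetical layout for the
    software table of active processors).
    """
    active = sorted(cpu for cpu, on in active_table.items() if on)
    if not active:
        raise RuntimeError("no active processors")
    placement = {}
    for thread, cpu in zip(threads, cycle(active)):
        placement[thread] = cpu
    return placement
```

With processors 0 and 1 active and spare 2 inactive, threads t0, t1, t2 land on 0, 1, 0; nothing reaches processor 2.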

[0034] Notably, the present invention maintains system uptime even while a processor in the 
MP is failing. This feature of the invention is particularly applicable to operations in a critical 
server environment that requires continuous uptime of the system. 

[0035] The spare processor remains hidden until one of the primary processors fails and the 
spare processor is needed to replace the failing processor. As illustrated in Figure 2B, the spare 
processor may be built on the same board as the main processors. However, in another 
implementation, the spare processor is provided on a separate processing board to make 
serviceability easier. For example, with the spare processor operation for Series x440 models, 
the bottom-most board with CPUs is identified as the ideal board on which to reserve one or more 
hot-spare CPUs, since the bottom CPU board is not as accessible. This enables a failed CPU to be 
disabled while the OS remains running. 

[0036] The spare processor is interconnected similarly to each primary processor with the 
exception that BIOS does not initialize the spare nor initially present the spare processor to the 
OS following the POST. During POST, BIOS hides the spare processor from the OS. This is 
accomplished either through a setup of the MP table and/or an ACPI setup. The BIOS brings the 
spare processor all the way online, then puts it into a spin lock (i.e., a tight software loop in the 
SMI handler). This has the effect of completely hiding the processor from the OS until the spare 
processor is needed. Once processor failure occurs and the spare processor is activated, the spare 
processor is then placed under complete OS control. 
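The parking behavior in paragraph [0036] (the spare is brought fully online and then kept in a tight loop until it is released) is sketched below with a Python thread standing in for the spare processor. This is only an analogy under stated assumptions: a real spare spins inside the SMI handler in system management mode, and the `ParkedSpare` class and its event-based release flag are hypothetical.

```python
import threading
import time


class ParkedSpare:
    """A thread standing in for a spare processor parked in a polling loop."""

    def __init__(self):
        self._release = threading.Event()  # hypothetical release flag set by BIOS
        self.online = False
        self.spins = 0

    def park(self):
        # Tight loop: keep polling the release flag; the OS never sees this CPU.
        while not self._release.is_set():
            self.spins += 1
            time.sleep(0)  # yield so the simulation does not starve other threads
        self.online = True  # released: the spare is now handed to the OS

    def release(self):
        self._release.set()
```

A caller starts `park` on a thread, observes that `online` stays false, calls `release`, and then joins the thread, after which `online` is true.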

[0037] Possible triggers for the failure-response features of the invention include the 
following: (1) CPU thermal trends are observed by a system management utility, which may be 
hardware (HW), firmware (FW), and/or software (SW). In one implementation, the system 
management utility also provides an SMI handler itself; (2) the CPU temperature goes over or 
near some operational breaking point; and (3) the system management entity causes an SMI to be 
invoked, indicating which CPU is to be off-loaded to the hot spare CPU. 
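Triggers (1) and (2) can be expressed as a small decision rule that a system management utility might evaluate before raising trigger (3). In this Python sketch the breaking point, the "near" margin, the trend threshold, and the three-step trend window are all hypothetical values chosen for illustration; the description gives no numbers.

```python
BREAKING_POINT_C = 95.0  # hypothetical operational breaking point
NEAR_MARGIN_C = 5.0      # hypothetical: how close counts as "near"
TREND_LIMIT_C = 2.0      # hypothetical: per-sample rise that counts as a bad trend


def should_offload(samples):
    """Decide from one CPU's temperature history (oldest first) whether to off-load it."""
    if not samples:
        return False
    if samples[-1] >= BREAKING_POINT_C - NEAR_MARGIN_C:
        return True  # trigger (2): over or near the breaking point
    rises = [b - a for a, b in zip(samples, samples[1:])]
    # Trigger (1): a sustained upward thermal trend across the whole window.
    return len(rises) >= 3 and all(r >= TREND_LIMIT_C for r in rises)


def cpus_to_offload(history):
    """CPU ids that trigger (3) would name in the off-load SMI."""
    return sorted(cpu for cpu, samples in history.items() if should_offload(samples))
```

A CPU reading 92 degrees is flagged for being near the limit, and one climbing 60, 63, 66, 69 is flagged for its trend, while a CPU idling around 50 is left alone.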

[0038] Implementation of the invention involves the utilization of Systems Management 
Interrupt (SMI) functionality provided within basic input output system (BIOS) of the IA-32 
systems. At least one processor of the multiprocessor system is reserved as a hot-spare processor 
that remains in an idle, off, or low-power mode. While in that mode, the OS is prevented from 
initially utilizing the hot-spare processor. When a processor failure is detected, SMI code 
running on a good processor instructs the OS to hold off allocating processes to the failing 
processor. Contemporaneously, the SMI (and OS) activates and completes an initialization of 
the hot-spare processor to prepare it to begin receiving the held-off processes. Control is then 
returned to the OS, which updates the "active" processor list and allocates the threads that were 
running on the failing processor to the hot-spare processor. 

[0039] The SMI is generated by hardware during various processor failure conditions. 
During an SMI, the system state of each processor is completely saved to memory and the caches 
and pipelines are flushed. A BIOS interrupt handler (i.e., the SMI handler) then takes control of 
the system until the handler executes a resume (RSM) instruction, which returns control to the 
OS. The SMI handler is utilized to allow BIOS to swap out processors. The BIOS places the 
failing processor in a halt state and contemporaneously brings the standby processor out of the 
held-off state and into an active state. Additionally, the BIOS transfers the saved processor state 
in memory from the old processor to the new processor. In one embodiment, the above processes are accomplished by providing a 
software register of active processors in the OS and setting the register to reflect the spare 
processor as active and the failing processor as inactive. 
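The swap performed inside the SMI handler in paragraph [0039] reduces to four state changes: halt the failing processor, activate the spare, move the saved state, and update the software register of active processors. The Python model below captures only that ordering; the `Cpu` record, the state strings, and the dictionary standing in for the SMM save-state area are simplifications, and real save-state migration between IA-32 processors involves far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    cpu_id: int
    state: str  # "active", "standby" (held off), or "halted"
    saved_state: dict = field(default_factory=dict)  # simplified save-state image


def smi_failover(cpus, failing_id, spare_id, active):
    """Toy model of the SMI-handler fail-over.

    cpus maps id -> Cpu; active is the set of ids the OS treats as active.
    """
    failing, spare = cpus[failing_id], cpus[spare_id]
    if failing.state != "active" or spare.state != "standby":
        raise ValueError("need an active failing CPU and a standby spare")
    failing.state = "halted"                       # halt the failing processor
    spare.state = "active"                         # bring the spare out of hold-off
    spare.saved_state = dict(failing.saved_state)  # move the saved state across
    failing.saved_state = {}
    active.discard(failing_id)                     # register: failing -> inactive
    active.add(spare_id)                           # register: spare -> active
    # A real handler would now execute RSM to return control to the OS.
```

After failing over processor 1 to spare 2, the active set becomes {0, 2} and processor 2 holds processor 1's former saved state.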
[0040] In the present invention also, SMI allows BIOS to detect a failure and control the 
reallocation of processing from a failing CPU to the hot spare CPU. The SMIs are generated 
through hardware, and an internal monitoring mechanism is provided for each processor or group 
of processors that generates the SMI. In the described embodiment, the specific processor-fault 
SMI is generated when a processor is failing. The SMI identifies to the OS and the BIOS which 
one of the processors is failing. SMIs are invoked periodically and/or asynchronously by the 
hardware or other system events, and possibly by software control. The functionality and 
utilization of the SMI and SMI handler are described in detail in United States 
Patent No. 6,463,492. 
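Because SMIs also arrive periodically and for other system events, the handler first has to decide whether a given SMI is a processor-fault SMI and, if so, which processor it names. This Python sketch assumes a two-value cause code and a per-processor fault flag reported by the monitoring mechanism; both are hypothetical stand-ins for whatever status the hardware actually exposes.

```python
from enum import Enum, auto


class SmiCause(Enum):
    PERIODIC = auto()   # routine SMI: no fail-over action
    CPU_FAULT = auto()  # processor-fault SMI raised by a monitoring mechanism


def handle_smi(cause, fault_status):
    """Return the id of the failing CPU named by a fault SMI, else None.

    fault_status maps CPU id -> True if that CPU's monitor reports a fault.
    Only one CPU is returned, matching the single-failure case described.
    """
    if cause is not SmiCause.CPU_FAULT:
        return None
    failing = [cpu for cpu, faulted in sorted(fault_status.items()) if faulted]
    return failing[0] if failing else None
```

A periodic SMI is ignored even when a fault flag is set, whereas a fault SMI with processor 1 flagged yields 1.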

[0041] Whenever a CPU fault occurs, an SMI is automatically generated by the existing 
hardware, as described in the above referenced patent. The SMI is utilized as a trigger that 
initiates the activation of the spare processor by the BIOS. Notably, unlike with other 
conventional methods, no hardware switching logic is needed to accomplish the fail-over process 
of the invention. That is, the invention provides a software-only solution for processor failure 
response. 

[0042] Thus, a processor is able to autonomically fail-over to a hot-spare processor without 
affecting the OS or executing applications. The invention substantially eliminates the problems 
related to high availability during single processor failure without crashing all of the processors. 
Further, the invention completes this solution in a way that also does not require specialized OS 
or middleware features. Finally, the invention provides a method for hot-swapping spare processors 
without the need of processor architecture changes. That is, no special hardware instruction 
needs to be built into the processor, as in other methods. 

[0043] Figure 3 is a flow chart illustrating the process of failure detection and response 
according to one implementation of the invention. For simplicity, the process is described with 
reference to the MP and processor configuration of Figures 1, 2A, and 2B. As provided at block 
301, the process begins with the BIOS setting up and installing the main processors, P0 and P1, 
during the initial POST of the MP. The presence of the spare processor is registered in the BIOS 
and that processor is initially left in power-down (held-off) or inactive mode. The OS and 
applications are not provided with an indication of the spare processor, so no operations are 
initially allocated to that processor. The primary processors commence processing application 
code and OS operations on the MP, as shown at block 303. The system monitors the processors 
for a detection of failure on one of the processors, and a determination is made at block 305 
whether a failure is detected. In one implementation, the processors detect failure within their 
subsystem, generate a processor fault SMI, and report the failure to the SMI handler. If no 
failure occurs, the processing of application and OS continues as normal on the main processors. 

[0044] If failure is detected, however, the receipt of a processor fault SMI by the OS 
immediately activates a switch over response to replace the failing processor with the spare 
processor as indicated at block 307. The spare processor is brought out of the held-off state. 
Concurrently, processes (instructions, threads, etc.) initially allocated to the failing processor are 
held by the OS and processing on the failing processor is stopped as shown at block 309. Once 
the spare processor has been successfully brought on line, the BIOS (table) is updated and the 
OS, bus controllers, etc. are messaged to change their configurations to reflect a processor 
change from the failing processor to the spare processor as shown at block 311. The processes 
initially allocated to the failing processor are sent to the spare processor as indicated at block 
313, and the application code and OS continue processing on the processors that are now 
functional (i.e., the other main processor and the spare processor) as provided at block 315. 
Notably, for this seamless transfer to be possible, all processors in the system must be identical 
or process instructions in an identical manner. Otherwise a degradation or improvement in 
processing may become noticeable and affect overall throughput. Once the fail-over to the spare 
processor has been completed, a signal/message is generated indicating (to the administrator) 
that the processor has failed and is replaced with a spare, as shown at block 317. Then the 
process ends as depicted at block 319. 
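The blocks of Figure 3 can be traced end to end in a toy model. This Python sketch compresses blocks 301 through 319 into one function under simplifying assumptions: threads are plain names mapped to CPU ids, the "table" is a list of active CPU ids, and the administrator alert is a string. None of these representations come from the description.

```python
def figure3_flow(main_cpus, spare_cpu, threads, failing_cpu=None):
    """Toy walk through blocks 301-319 of Figure 3.

    threads maps a thread name to the CPU it runs on. Returns the active CPU
    list, the new thread placement, and any administrator alerts.
    """
    active = list(main_cpus)  # 301: POST installs P0, P1; spare stays held off
    alerts = []
    # 303: OS and applications run on the main processors only.
    if failing_cpu is None or failing_cpu not in active:
        return active, dict(threads), alerts  # 305: no failure, keep running
    # 307: processor-fault SMI; the spare is brought out of the held-off state.
    held = {t for t, cpu in threads.items() if cpu == failing_cpu}  # 309: hold work
    active[active.index(failing_cpu)] = spare_cpu  # 311: update the table
    placement = {t: (spare_cpu if t in held else cpu)  # 313: send held work to spare
                 for t, cpu in threads.items()}
    # 315: processing continues on the surviving main CPU and the spare.
    alerts.append(f"CPU {failing_cpu} failed; replaced by spare CPU {spare_cpu}")  # 317
    return active, placement, alerts  # 319: end
```

If P1 fails while running "db" and "web", both threads move to the spare, "os" stays on P0, and one alert is produced for the administrator.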

[0045] The invention provides a method for hot-swapping spare processors without the need of 
processor architecture changes, i.e., without requiring a hardware instruction built into the 
processor. The invention provides high availability of processing resources and eliminates 
delays due to personnel switching out failing processors manually. Further, the invention 
completes this solution in a way that also does not require specialized OS or middleware 
features. 



[0046] The present invention provides several advantages over other available methods 
including: (1) no specialized software is needed because SMI is in BIOS, and the reaction to a 
failing processor is automatic. Also, (2) using an SMI, a service processor is able to identify a 
failing processor and halt its execution from an outside perspective, unlike the limitation with a 
software implementation. Furthermore, (3) an alert about a hot-spare processor switching event 
can be generated. 

[0047] The other software methods do not utilize SMIs or SMI functionality and encounter 
several drawbacks, including: (1) customers must purchase the OS/software license that provides 
the hot-spare-processor feature, since not all OSs support this feature; and (2) the OS, which 
has to make the decision to switch to a hot spare processor, may be running its decision threads 
on the faulting processor, leading to incorrect code execution. 

[0048] While the invention has been particularly shown and described with reference to a 
preferred embodiment, it will be understood by those skilled in the art that various changes in 
form and detail may be made therein without departing from the spirit and scope of the 
invention. 